Rhea: Automatic Filtering for Unstructured Cloud Storage
Authors
Abstract
Unstructured storage and data processing using platforms such as MapReduce are increasingly popular for their simplicity, scalability, and flexibility. Using elastic cloud storage and computation makes them even more attractive. However, cloud providers such as Amazon and Windows Azure separate their storage and compute resources even within the same data center. Transferring data from storage to compute thus uses core data center network bandwidth, which is scarce and oversubscribed. Because the data is unstructured, the infrastructure cannot automatically apply selection, projection, or other filtering predicates at the storage layer. The problem is even worse if customers want to use compute resources from one provider on data stored with one or more other providers. The bottleneck is then the WAN link, which not only hurts performance but also incurs egress bandwidth charges. This paper presents Rhea, a system that automatically generates and runs storage-side data filters for unstructured and semi-structured data. It uses static analysis of application code to generate filters that are safe, stateless, side-effect-free, best-effort, and transparent to both the storage and compute layers. Filters never remove data that is used by the computation. Our evaluation shows that Rhea filters reduce data transfer by 2x to 20,000x, which reduces job run times by up to 5x and dollar costs for cross-cloud computations by up to 13x.
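The safety property stated in the abstract (a filter never removes data the computation uses) can be illustrated with a minimal sketch. Everything here is hypothetical and hand-written: the record layout, `mapper`, `row_filter`, `storage_side`, and `run_job` are illustrative names, not Rhea's API, and the filter is written by hand rather than derived by static analysis as in the paper.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical map function: emit (host, 1) for each ERROR record."""
    fields = line.split(",")
    if len(fields) >= 3 and fields[1] == "ERROR":
        yield fields[0], 1


def row_filter(line: str) -> bool:
    """Hand-written stand-in for a generated storage-side filter.

    Stateless and side-effect-free: the decision depends only on `line`.
    Best-effort: if the check itself fails, keep the record.
    """
    try:
        fields = line.split(",")
        return len(fields) >= 3 and fields[1] == "ERROR"
    except Exception:
        return True  # when unsure, never drop data


def storage_side(lines: Iterable[str], keep: Callable[[str], bool]) -> List[str]:
    """Apply the filter before data leaves the (simulated) storage layer."""
    return [line for line in lines if keep(line)]


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run the map function and sum the emitted values per key."""
    counts: Dict[str, int] = {}
    for line in lines:
        for key, value in mapper(line):
            counts[key] = counts.get(key, 0) + value
    return counts


# Assumed sample data: comma-separated "host,level,message" records.
records = [
    "host1,ERROR,disk full",
    "host2,INFO,started",
    "host1,INFO,heartbeat",
    "host2,ERROR,timeout",
    "host1,ERROR,disk full",
    "malformed line",
]

filtered = storage_side(records, row_filter)

# Transparency: the job computes the same result on filtered input.
assert run_job(filtered) == run_job(records)
```

In this toy case only the records the map function would act on cross the network, and the job's output is unchanged; the paper's contribution is producing such filters automatically from the application code.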
Similar resources
Comprehensive Analysis of Dense Point Cloud Filtering Algorithm for Eliminating Non-Ground Features
Point cloud and LiDAR Filtering is removing non-ground features from digital surface model (DSM) and reaching the bare earth and DTM extraction. Various methods have been proposed by different researchers to distinguish between ground and non-ground in point cloud and LiDAR data. Most fully automated methods have a common disadvantage, and they are only effective for a particular type of surf...
A multi-scale convolutional neural network for automatic cloud and cloud shadow detection from Gaofen-1 images
The reconstruction of the information contaminated by cloud and cloud shadow is an important step in pre-processing of high-resolution satellite images. The cloud and cloud shadow automatic segmentation could be the first step in the process of reconstructing the information contaminated by cloud and cloud shadow. This stage is a remarkable challenge due to the relatively inefficient performanc...
An Efficient Design and Implementation of an MdbULPS in a Cloud-Computing Environment
Flexibly expanding the storage capacity required to process a large amount of rapidly increasing unstructured log data is difficult in a conventional computing environment. In addition, implementing a log processing system providing features that categorize and analyze unstructured log data is extremely difficult. To overcome such limitations, we propose and design a MongoDB-based unstructured ...
A Vulnerable Scoring through Code-based Cloud Storage System with Sheltered Data Forwarding
Cloud Storage System has a collection of storage servers provides long-standing storage services over the internet. Data privacy becomes a major concern in cloud storage system because user stores their data in third party cloud system. An encryption scheme available for data privacy but it limits the number of functions done in storage system. Building a secure storage system that supports mul...
Presenting a Morphological Based Approach for Filtering The Point Cloud to Extract the Digital Terrain Model
The Digital terrain model is an important geospatial product used as the basis of many practical projects related to geospatial information. Nowadays, a dense point cloud can be generated using the LiDAR data. Actually, the acquired point cloud of the LiDAR, presents a digital surface model that contains ground and non-ground objects. The purpose of this paper is to present a new approach of ex...
Journal title:
Volume, issue:
Pages: -
Publication date: 2013